
Expert Systems with Applications

Elsevier BV

Preprints posted in the last 30 days, ranked by how well they match the content profile of Expert Systems with Applications, based on 11 papers previously published in this journal. The average preprint has a 0.02% match score for this journal, so anything above that is an above-average fit.

1
Multi-task deep learning integrating pretreatment MRI and whole slide images predicts induction chemotherapy response and survival in locally advanced nasopharyngeal carcinoma

Hou, J.; Yi, X.; Li, C.; Li, J.; Cao, H.; Lu, Q.; Yu, X.

2026-04-11 radiology and imaging 10.64898/2026.04.07.26350350 medRxiv
Top 0.1%
3.6%

Predicting response to induction chemotherapy (IC) and overall survival (OS) is critical for optimizing treatment in patients with locally advanced nasopharyngeal carcinoma (LANPC). This study aimed to develop and validate a multi-task deep learning model integrating pretreatment MRI and whole slide images (WSIs) to predict IC response and OS in LANPC. Pretreatment MRI and WSIs from 404 patients with LANPC were retrospectively collected to construct a multi-task model (MoEMIL) for the simultaneous prediction of early IC response and OS. MoEMIL employed multi-instance learning to process WSIs, PyRadiomics and a convolutional neural network (ResNet50) to extract MRI features, and fused multimodal features through a multi-gate mixture-of-experts architecture. Clustering-constrained attention multiple instance learning and gradient-weighted class activation mapping were applied for visualization and interpretation. MoEMIL effectively stratified patients into good and poor IC response groups, achieving areas under the curve of 0.917, 0.869, and 0.801 in the training, validation, and test sets, respectively, and outperformed the deep learning radiomics model, the pathomics model and TNM staging. The model also stratified patients into high- and low-risk OS groups (P < 0.05). MoEMIL shows promise as a decision-support tool for early IC response prediction and prognostication in LANPC.

Author Summary: We have developed a deep learning model that integrates two types of medical images, magnetic resonance imaging (MRI) and digital pathological slices, to simultaneously predict response to induction chemotherapy and prognosis in patients with locally advanced nasopharyngeal carcinoma. Current treatment decisions primarily rely on traditional tumor staging (TNM), which often fails to comprehensively reflect the complexity of the disease. Our model, named MoEMIL, was trained and tested on data from 404 patients across two hospitals and consistently outperformed both single-model approaches and TNM staging methods. By identifying patients who exhibit poor response to induction chemotherapy or higher prognostic risk, our tool can assist clinicians in achieving personalized treatment, enabling intensified management for high-risk patients and avoiding unnecessary side effects for low-risk patients. Additionally, we visualize the model's reasoning process through heat-map generation, which highlights the image regions exerting the greatest influence on prediction outcomes. This work represents a step toward more precise treatment for nasopharyngeal carcinoma; however, larger-scale prospective studies are required before the model can be integrated into routine clinical practice.
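The multi-gate mixture-of-experts fusion described above can be sketched in miniature. This is a pure-Python illustration under our own assumptions, not the authors' MoEMIL code; in a multi-gate MoE, each task (here, IC response and OS) would own its own gate over the shared experts:

```python
import math

def softmax(logits):
    # numerically stable softmax
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_fuse(expert_feats, gate_logits):
    """Weight each expert's feature vector by a softmax gate and sum.

    In a multi-gate MoE there is one such gate per task, so each task
    mixes the same experts with different weights.
    """
    weights = softmax(gate_logits)
    dim = len(expert_feats[0])
    return [sum(w * feats[d] for w, feats in zip(weights, expert_feats))
            for d in range(dim)]

# Two experts (say, an MRI branch and a WSI branch) with 3-d features;
# equal gate logits give each expert weight 0.5:
fused = moe_fuse([[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]], gate_logits=[0.0, 0.0])
# → [0.5, 0.5, 2.0]
```

With a second gate (different logits over the same two experts), the OS head would receive a differently weighted mixture of the identical expert outputs.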

2
The false positive paradox: Examining real-world clinical predictive performance of FDA-authorized AI devices for radiology using clinical prevalence

Sparnon, E.; Stevens, K.; Song, E.; Harris, R. J.; Strong, B. W.; Bruno, M. A.; Baird, G. L.

2026-03-27 radiology and imaging 10.64898/2026.03.25.26349197 medRxiv
Top 0.1%
1.7%

The present study evaluates the real-world clinical predictive performance of FDA-authorized artificial intelligence (AI) devices used in radiology, focusing on the false positive paradox (FPP) and its implications for clinical practice. To do this, we analyzed publicly available FDA data on AI radiology devices from 2024 and 2025 from 510(k) summaries, demonstrating how diagnostic accuracy metrics like sensitivity and specificity do not necessarily translate into high positive predictive value (PPV) due to the influence of target disease prevalence. We show the importance of disclosing the false discovery rate (FDR) and false omission rate (FOR) and argue that this transparency enables clinicians to select AI systems that balance false positive and false negative costs in a clinically, ethically, and financially appropriate manner. Finally, we provide recommendations for what data should be provided to best serve practices and radiologists.
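The false positive paradox is easy to reproduce numerically: apply Bayes' rule to a device's sensitivity and specificity at a realistic prevalence. A minimal sketch, not the authors' analysis code; the example numbers are ours:

```python
def predictive_values(sens, spec, prev):
    """Positive/negative predictive value at a given disease prevalence."""
    tp = sens * prev                   # true positives per screened patient
    fp = (1.0 - spec) * (1.0 - prev)   # false positives
    tn = spec * (1.0 - prev)
    fn = (1.0 - sens) * prev
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# A seemingly excellent device (95% sensitive, 95% specific) screening a
# finding with 1% prevalence:
ppv, npv = predictive_values(0.95, 0.95, 0.01)
# FDR = 1 - PPV and FOR = 1 - NPV: most flagged studies are false alarms.
```

At these numbers the PPV is about 0.16, so roughly five of every six positive calls are wrong even though both headline metrics read 95%, which is exactly why the abstract argues for disclosing FDR and FOR alongside sensitivity and specificity.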

3
A Deployable Explainable Deep Learning System for Tuberculosis Detection from Chest X-Rays in Resource-Constrained High-Burden Settings

Agumba, J.; Erick, S.; Pembere, A.; Nyongesa, J.

2026-04-01 radiology and imaging 10.64898/2026.03.31.26349662 medRxiv
Top 0.2%
1.4%

Objectives: To develop and evaluate a deployable deep learning system with Gradient-weighted Class Activation Mapping (Grad-CAM) for tuberculosis screening from chest radiographs and to assess its classification performance and explainability across desktop and mobile deployment platforms. Materials and methods: This study used publicly available chest X-ray datasets containing Normal and Tuberculosis images. A DenseNet121-based transfer learning model was trained using stratified training, validation, and test splits with data augmentation and class weighting. Model performance was evaluated using accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC). Grad-CAM was used to visualize regions influencing model predictions. The trained model was converted to TensorFlow Lite and deployed in both a Windows desktop application and a Flutter-based mobile application for offline inference and visualization. Results: The model demonstrated strong classification performance on the independent test dataset, with high accuracy and AUC values indicating effective discrimination between Normal and Tuberculosis cases. Grad-CAM visualizations showed that the model focused primarily on anatomically relevant lung regions, particularly the upper and mid-lung fields in Tuberculosis cases. Deployment testing confirmed consistent prediction outputs and Grad-CAM visualizations across both Windows and mobile platforms. Conclusion: The proposed deployable deep learning system with Grad-CAM provides accurate and interpretable tuberculosis screening from chest radiographs and demonstrates feasibility for offline mobile and desktop deployment. This approach has potential as an artificial intelligence-assisted screening and decision support tool in radiology, particularly in resource-limited and remote healthcare settings.
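Class weighting of the kind this abstract mentions is commonly the inverse-frequency heuristic; the sketch below mirrors scikit-learn's "balanced" scheme and is our assumption of the approach, not the authors' exact code:

```python
from collections import Counter

def balanced_class_weights(labels):
    """w_c = n_samples / (n_classes * n_c): rare classes get larger
    weights, so each class contributes equally to the weighted loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# A skewed split of 90 Normal vs 10 Tuberculosis X-rays:
weights = balanced_class_weights(["Normal"] * 90 + ["TB"] * 10)
# The TB class gets weight 5.0, Normal about 0.56, rebalancing the loss.
```

Passed to the loss (e.g. Keras's `class_weight` argument), these weights keep the minority Tuberculosis class from being drowned out during training.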

4
Climate-Informed Deep Learning for Spatio-Temporal Forecasting of Climate-Sensitive Diseases

Tegenaw, G. S.; Degu, M. Z.; Gebeyehu, W. B.; Senay, A. B.; Krishnamoorthy, J.; Ward, T.; Simegn, G. L.

2026-03-24 public and global health 10.64898/2026.03.20.26348930 medRxiv
Top 0.2%
1.3%

Effective public health planning and intervention strategies necessitate an understanding of the temporal and geographic distribution of disease incidences. This requires robust frameworks for disease incidence forecasting. However, due to variations in cases and temporal dynamics, grasping the distinct patterns of climate-sensitive diseases poses significant challenges, including identifying hotspots, trends, and seasonal variations in disease incidence. Furthermore, although most studies focus on directly predicting future incidence using historical patterns and covariates, a significant gap remains between methodological proliferation marked by diverse architectures, where models are trained and validated on benchmark datasets that are standardized and statistically stable, and epidemiological reality, which is often characterized by irregular, sparse, and highly skewed data, as well as rare but high-magnitude or bimodally distributed incidences. Hence, traditional end-to-end approaches that directly map climate and disease data often fail in these data-scarce settings due to overfitting and poor generalization. To understand disease epidemiology and mitigate the impact of incidence, we analyzed a decade of retrospective datasets in Ethiopia to examine how climate and weather conditions influence the incidence or spread of climate-sensitive diseases, including malaria and dysentery. In this study, we proposed a two-stage hybrid framework, a climate-informed disease prediction model, to forecast the likelihood of disease incidences using decades of climate and weather data. First, deep learning was applied to capture latent weather dynamics. Then, a hurdle model using Extreme Gradient Boosting (XGB) was designed for zero-inflated incidence data, combining XGBClassifier to predict incidence and XGBRegressor to estimate its size, based on weather dynamics to forecast disease incidence. 
Our proposed multivariate climate-driven disease incidence model incorporates both spatial (elevation, coordinates) and temporal (year, month) factors, along with key weather parameters (precipitation, sunlight, wind, relative humidity, temperature), to predict the likelihood of multiple diseases occurring in each area, serving as a foundation for future disease incidence predictions in the region. Out of 72 evaluated experiments across four categories and six targets, we found that the Transformer model showed the highest number of statistically significant wins (n=18, 25.0%) in comparison with Long Short-Term Memory (LSTM) (n=9, 12.5%) and the Temporal Convolutional Neural Network (TCN) (n=5, 6.9%) for climate-variable forecasting, using pairwise Diebold-Mariano tests. The hurdle model that combines XGBClassifier and XGBRegressor outperformed the baseline in both malaria and dysentery forecasting. Error stratification revealed that the hurdle model provided the greatest benefit during incidence periods, with a substantially lower mean absolute error (MAE) than the baseline in both incidence and non-incidence periods. Our proposed modular pipeline first forecasts climate variables, then predicts disease incidence, thereby enhancing interpretability and generalization in data-sparse settings. Overall, this approach provides a scalable, climate-aware forecasting tool for public health planning, particularly in regions where these diseases are endemic or where climate change may affect their prevalence, as well as in data-scarce settings.
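The two-part hurdle structure (XGBClassifier gating an XGBRegressor) reduces to: first decide whether any incidence occurs, and only then predict how much. A minimal pure-Python sketch of the combination step; the probabilities and sizes below stand in for the fitted models' outputs, and the 0.5 threshold is our assumption:

```python
def hurdle_predict(p_incidence, size_forecast, threshold=0.5):
    """Zero-inflated forecast: the classifier decides zero vs non-zero,
    the regressor supplies the magnitude for the non-zero cases."""
    return [size if p >= threshold else 0.0
            for p, size in zip(p_incidence, size_forecast)]

# Classifier flags periods 1 and 3 as likely incidence; regressor sizes them:
forecast = hurdle_predict([0.9, 0.1, 0.7, 0.2], [120.0, 15.0, 40.0, 8.0])
# → [120.0, 0.0, 40.0, 0.0]
```

Gating this way lets the regressor train only on non-zero periods, which is the usual remedy for zero-inflated count targets like sparse disease incidence.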

5
HybridNet-XR: Efficient Teacher-Free Self-Supervised Learning for Autonomous Medical Diagnostic Systems in Resource-Constrained Environments.

Mayala, S.; Mzurikwao, D.; Suluba, E.

2026-03-19 health informatics 10.64898/2026.03.16.26348570 medRxiv
Top 0.2%
1.3%

Deep learning model classification on large datasets is often limited in countries with restricted computational resources. While transfer learning can offset these limitations, standard architectures often maintain a high memory footprint. This study introduces HybridNet-XR, a memory-efficient and computationally lightweight hybrid convolutional neural network (CNN) designed to bridge the domain gap in medical radiography using autonomous self-supervised learning protocols. The HybridNet-XR architecture integrates depthwise separable convolutions for parameter reduction, residual connections for gradient stability, and aggressive early downsampling to minimize the video RAM (VRAM) footprint. We evaluated several training paradigms, including teacher-free self-supervised learning (SSL-SimCLR), teacher-led knowledge distillation (KD), and domain-gap (DG) adaptation. Each variant was pre-trained on ImageNet-1k subsets and fine-tuned on the ChestX6 multi-class dataset. Model interpretability was validated through gradient-weighted class activation mapping (Grad-CAM). The performance frontier analysis identified the HybridNet-XR-150-PW (Pre-warmed) as the optimal configuration, achieving a 93.38% average accuracy and 99% AUC while utilizing only 814.80 MB of VRAM. Regarding class-wise accuracy, this variant significantly outperformed standard MobileNetV2 and teacher-led models in critical diagnostic categories, notably Covid-19 (97.98%) and Emphysema (96.80%). Grad-CAM visualizations confirmed that the teacher-free pre-warming phase allows the model to develop sharper, anatomically grounded focus on pathological landmarks compared to distilled models. Specialized pre-warming schedules offer a viable, computationally autonomous alternative to knowledge distillation for medical imaging. 
By eliminating the requirement for high-performance teacher models, HybridNet-XR provides a robust and trustworthy diagnostic foundation suitable for clinical deployment in resource-constrained environments.

Author summary: Traditional deep learning models for medical imaging are often too large for the low-power computers available in many global health settings. We developed a new model to bridge this computational gap. We designed HybridNet-XR, a highly efficient AI architecture, and trained it using a "teacher-free" method that doesn't require a massive supercomputer. We found a specific version (H-XR150-PW) that provides high accuracy while using very little memory. Our results show that high-performance diagnostic AI can be deployed on standard, low-cost hardware. Furthermore, using visual heatmaps (Grad-CAM), we proved that the AI correctly identifies medical landmarks like lung opacities, ensuring it is safe and reliable for real-world clinical use.
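The memory savings from depthwise separable convolutions, one of the abstract's key design choices, are easy to quantify. A small parameter-count comparison; the layer sizes are illustrative, not taken from the paper:

```python
def standard_conv_params(c_in, c_out, k):
    # one k x k kernel per (input channel, output channel) pair
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    # depthwise: one k x k kernel per input channel;
    # pointwise: a 1 x 1 convolution projecting c_in -> c_out channels
    return c_in * k * k + c_in * c_out

dense = standard_conv_params(64, 128, 3)   # 73728 parameters
light = separable_conv_params(64, 128, 3)  # 576 + 8192 = 8768 parameters
ratio = dense / light                      # roughly 8.4x fewer parameters
```

Stacked over a whole network, this per-layer reduction (plus aggressive early downsampling of activations) is what keeps the VRAM footprint near the reported 815 MB.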

6
Opioids Overdose Death Prediction with Graph Neural Networks

Chen, X.; Gu, Z.; Myers, J.; Kim, J.; Yin, C.; Fareed, N.; Thomas, N.; Fernandez, S.; Zhang, P.

2026-03-20 public and global health 10.64898/2026.03.18.26348454 medRxiv
Top 0.2%
1.3%

The opioid crisis has severely impacted Ohio, with overdose death rates surpassing national averages and disproportionately affecting rural and Appalachian regions. Accurately predicting county-level opioid overdose deaths (OD) is critical for timely intervention but remains challenging due to the wide differences in opioid OD between large and small counties. We propose a Spatial-Temporal Graph Neural Network (ST-GNN) framework that integrates graph neural networks (GNNs) to capture spatial relationships between counties and Long Short-Term Memory (LSTM) networks to model temporal dynamics. Using quarterly OD data from Q1 2017 to Q2 2023 for 88 Ohio counties, we incorporate a nine-dimensional dynamic feature set, including naloxone administration events and high-risk opioid prescribing, along with a static Social Determinants of Health (SDoH) index. Compared to traditional statistical models and temporal deep learning baselines, our ST-GNN demonstrates superior performance, particularly in larger counties, while a classification-based strategy improves predictions for small counties, leading to more stable and reliable results. Our findings emphasize the need for spatial-temporal modeling and customized training to enhance public health decision-making in addressing the opioid crisis.
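The spatial half of such an ST-GNN boils down to message passing over the county adjacency graph. A minimal mean-aggregation sketch with one scalar feature per county; this is our own toy illustration of the mechanism, not the authors' architecture:

```python
def mean_aggregate(adj, x):
    """One message-passing step: each county's new value is the average
    of its own and its neighbours' values (self-loop included). In the
    full model, stacked steps like this feed an LSTM that handles the
    quarterly temporal dynamics."""
    n = len(x)
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if j == i or adj[i][j]]
        out.append(sum(x[j] for j in nbrs) / len(nbrs))
    return out

# Three counties in a line (0-1 and 1-2 adjacent), OD counts [1.0, 2.0, 3.0]:
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
smoothed = mean_aggregate(adj, [1.0, 2.0, 3.0])
# → [1.5, 2.0, 2.5]
```

A learned GNN layer replaces the plain average with trainable weights, but the intuition is the same: a county's forecast borrows statistical strength from its neighbours, which is precisely what helps the small, noisy counties.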

7
Benchmark of biomarker identification and prognostic modeling methods on diverse censored data

Fletcher, W. L.; Sinha, S.

2026-04-01 bioinformatics 10.64898/2026.03.29.715113 medRxiv
Top 0.2%
1.2%

The practice of identifying biomarkers and developing prognostic models using genomic data has become increasingly prevalent. Such data often feature characteristics that make these practices difficult, namely high dimensionality, correlations between predictors, and sparsity. Many modern methods have been developed to address these problematic characteristics while performing feature selection and prognostic modeling, but a large-scale comparison of their performance in these tasks on diverse right-censored time-to-event data (aka survival time data) is much needed. We have compiled many existing methods, including some machine learning methods, several of which have performed well in previous benchmarks, primarily to compare variable-selection capability and secondarily survival-time prediction, on many synthetic datasets with varying levels of sparsity, correlation between predictors, and signal strength of informative predictors. For illustration, we have also performed multiple analyses using these methods on a publicly available and widely used cancer cohort from The Cancer Genome Atlas. We evaluated the methods through extensive simulation studies in terms of the false discovery rate, F1-score, concordance index, Brier score, root mean square error, and computation time. Of the methods compared, CoxBoost and the Adaptive LASSO performed well in all metrics, and the LASSO and elastic net excelled when evaluating concordance index and F1-score. The Benjamini-Hochberg and q-value procedures showed volatile performance in controlling the false discovery rate. Some methods' performances were greatly affected by differences in the data characteristics. With our extensive numerical study, we have identified the best performing methods for a plethora of data characteristics using informative metrics. This will help cancer researchers in choosing the best approach for their needs when working with genomic data.
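Of the metrics listed, the concordance index is the one most specific to censored data; a minimal Harrell's C implementation under the usual definition (our sketch, not the benchmark's code):

```python
def concordance_index(times, events, risks):
    """Harrell's C for right-censored survival data: among comparable
    pairs (the earlier time is an observed event), the fraction where
    the earlier failure was assigned the higher risk; tied risks count
    one half."""
    concordant = ties = comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable

# Perfect ranking: the earliest death carries the highest predicted risk.
c = concordance_index([1.0, 2.0, 3.0], [1, 1, 0], [0.9, 0.5, 0.1])
# → 1.0
```

C = 0.5 corresponds to random ranking and C = 1.0 to perfect discrimination, which is why it serves as the survival analogue of AUC in benchmarks like this one.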

8
AI-Assisted Pneumonia Detection, Localisation and Report Generation from Chest X-rays

Boiardi, F. E.; Lain, A. D.; Posma, J. M.

2026-03-23 radiology and imaging 10.64898/2026.03.20.26348879 medRxiv
Top 0.3%
1.0%

Pneumonia detection in chest X-rays (CXRs) is complicated by high inter-observer variability and overlapping radiographic patterns. While deep learning (DL) solutions show promise, limitations in generalisability and explainability hinder clinical adoption. We address these challenges by introducing a holistic DL-based computer-aided diagnosis (CAD) pipeline for pneumonia detection, localisation, and structured report generation from CXRs. We curated the largest composite of publicly available CXRs to date (N=922,634), of which [Formula] were used for training. MIMIC-CXR radiology reports were relabelled using a local large language model (LLM), positing that LLM-derived pneumonia labels would yield higher diagnostic sensitivity than the provided rule-based natural language processing (rNLP) labels. DenseNet-121 classifiers were trained on four configurations: MIMIC-CXR (rNLP), MIMIC-CXR (LLM), and each supplemented with VinDr-CXR data. Gradient-weighted Class Activation Mapping (Grad-CAM) provided visual explainability and lung zone-based localisation. LLM-driven relabelling significantly improved human-label agreement (96.5% vs 72.5%, P=1.66x10^-11). The best-performing model (MIMIC-CXR (LLM) + VinDr-CXR) achieved 82.08% sensitivity and 81.97% precision, surpassing both radiologist sensitivity ranges (64-77.7%) and CheXNet's pneumonia F1-score (43.5%). Grad-CAM localisation attained a moderate F1-score of 52.9% (sensitivity=65.7%, precision=44.3%), confirming focus alignment with pathological lung regions while highlighting areas for refinement. These findings demonstrate that LLM-driven label curation, combined with DL, can exceed conventional rNLP and radiologist performance, advancing high-quality data integration in predictive medical imaging. Clinically, our pipeline offers rapid triage, automated report drafting, and real-time pneumonia surveillance: tools that can streamline radiology workflows and mitigate diagnostic errors.

9
MOE-ECG: Multi-Objective Ensemble Fusion for Robust Atrial Fibrillation Detection Using Electrocardiograms

Peimankar, A.; Hossein Motlagh, N.; K. Khare, S.; Spicher, N.; Dominguez, H.; Abolghasemi, V.; Fujiwara, K.; Teichmann, D.; Rahmani, R.; Puthusserypady, S.

2026-03-30 health informatics 10.64898/2026.03.28.26349522 medRxiv
Top 0.3%
1.0%

Background: Atrial fibrillation (AFib) is the most common sustained arrhythmia in the world, imposing a heavy clinical and economic burden on global healthcare systems. Early detection of AFib can reduce mortality and morbidity, while helping to alleviate the growing economic burden of cardiovascular diseases. With the increasing availability of digital health technologies, computational solutions have great potential to support the timely diagnosis of cardiac abnormalities. Objectives: With the increasing availability of electrocardiogram (ECG) data from clinical and wearable devices, manual interpretation has become impractical due to its time-consuming and subjective nature. Existing automated approaches often rely on single classifiers or fixed ensembles that primarily optimize predictive accuracy while neglecting model diversity, which leads to limited robustness and generalization across heterogeneous datasets. Therefore, this study aims to develop a robust and diversity-aware framework for automatic AFib detection that simultaneously improves classification performance and model generalizability. To this end, we propose MOE-ECG, a multi-objective ensemble selection and fusion framework that explicitly optimizes both predictive performance and inter-model diversity for reliable AFib detection from ECG recordings. Methods: The proposed multi-objective ensemble (MOE) framework casts ensemble selection as a bi-objective optimization problem and employs multi-objective particle swarm optimization to identify complementary classifiers from a heterogeneous model pool. Unlike conventional ensembles, it explicitly optimizes both predictive performance and diversity and integrates Dempster-Shafer theory for uncertainty-aware decision fusion. After filtering the ECG signals to remove baseline wander and noise, they were segmented into windows of 20, 60, and 120 heartbeats with 50% overlap.
The proposed approach was evaluated over five independent runs to assess its stability and generalization. Fifteen statistical and nonlinear features were obtained from the RR-intervals of the pre-processed ECG signals, of which eight were selected via correlation analysis to capture subtle information from the ECG data. We trained and evaluated the performance of the proposed model on three open-source databases: the MIT-BIH Atrial Fibrillation Database, Saitama Heart Database Atrial Fibrillation, and Long-Term AF Database. Results: The proposed approach achieved the best overall performance on 60-beat segments, with an average accuracy of 89.85%, precision of 91.14%, recall of 94.19%, an F1-score of 92.64%, and area under the curve (AUC) of around 0.95. Statistical analysis using Holm-adjusted Wilcoxon tests confirmed significant improvements (p<0.05) compared to both the best individual classifier and the unoptimized average ensemble of all classifiers. These findings show that the proposed selection and evaluation methodology, rather than group aggregation alone, is the key driver of performance improvements. Conclusion: The results obtained demonstrate that the MOE-ECG model offers a robust, accurate, and reliable solution for the detection of AFib from short ECG segments. The empirical findings, in general, confirm that multi-objective ensemble fusion enhances diagnostic performance and offers robust predictions that will open up possibilities for real-time AFib detection in clinical and telehealth settings.
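Dempster-Shafer fusion, the abstract's decision layer, combines per-classifier "mass" assignments over {AF, Normal} plus an uncertainty mass on the whole frame. A minimal two-source sketch of Dempster's rule with made-up masses; this illustrates the mechanism, not the MOE-ECG code:

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions over the frame {AF, N}.
    'ANY' carries each classifier's uncertainty mass on the full frame;
    mass landing on conflicting (disjoint) pairs is discarded and the
    remainder renormalised."""
    as_set = {"AF": {"AF"}, "N": {"N"}, "ANY": {"AF", "N"}}
    combined = {"AF": 0.0, "N": 0.0, "ANY": 0.0}
    conflict = 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            inter = as_set[a] & as_set[b]
            if not inter:
                conflict += pa * pb          # e.g. AF meets N
            elif inter == {"AF", "N"}:
                combined["ANY"] += pa * pb   # both uncertain
            else:
                combined[next(iter(inter))] += pa * pb
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two classifiers that both lean AF, with different confidence:
fused = dempster_combine({"AF": 0.6, "N": 0.2, "ANY": 0.2},
                         {"AF": 0.5, "N": 0.3, "ANY": 0.2})
# Fused belief in AF (~0.72) exceeds either source's individual mass.
```

Because uncertain classifiers place mass on "ANY" rather than on a class, the rule naturally down-weights hesitant ensemble members, which is the uncertainty-aware behaviour the abstract refers to.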

10
Improving Glioblastoma Classification Using Quantitative Transport Mapping with a Synthetic Data Trained Deep Neural Network

Romano, D. J.; Roberts, A. G.; Weppner, B.; Zhang, Q.; John, M.; Hu, R.; Sisman, M.; Kovanlikaya, I.; Chiang, G. C.; Spincemaille, P.; Wang, Y.

2026-04-01 radiology and imaging 10.64898/2026.03.31.26349864 medRxiv
Top 0.3%
0.9%

Purpose: To develop a deep neural network-based, AIF-free perfusion estimation method (QTMnet) for improved performance on glioma classification. Methods: A globally defined arterial input function (AIF) is needed to recover perfusion parameters in the two-compartment exchange model (2CXM). We have developed Quantitative Transport Mapping (QTM) to create an AIF-independent estimation method. QTM estimation can be formulated using deep neural networks trained on synthetic DCE-MRI data (QTMnet). Here, we provide a fluid mechanics-based DCE-MRI simulation with exchange between the capillaries and extravascular extracellular space. We implemented tumor ROI generation to morphologically characterize tissue perfusion. We compared our QTMnet implementation with 2CXM on 30 glioma human subjects, 15 of which had low-grade gliomas, and 15 with high-grade glioblastomas. Results: QTMnet outperforms (best AUC: 0.973) traditional 2CXM (best AUC: 0.911) in a glioma grading task. Conclusion: The AIF-independent QTMnet estimation provides a quantitative delineation between low-grade and high-grade gliomas.

11
A Comparative Study in Surgical AI: Datasets, Foundation Models, and Barriers to Med-AGI

Skobelev, K.; Fithian, E.; Baranovski, Y.; Cook, J.; Angara, S.; Otto, S.; Yi, Z.-F.; Zhu, J.; Donoho, D. A.; Han, X. Y.; Mainkar, N.; Masson-Forsythe, M.

2026-03-28 surgery 10.64898/2026.03.26.26349455 medRxiv
Top 0.3%
0.9%

Recent Artificial Intelligence (AI) models have matched or exceeded human experts in several benchmarks of biomedical task performance, but have lagged behind on surgical image-analysis benchmarks. Since surgery requires integrating disparate tasks, including multimodal data integration, human interaction, and physical effects, generally capable AI models could be particularly attractive as a collaborative tool if performance could be improved. On the one hand, the canonical approach of scaling architecture size and training data is attractive, especially since millions of hours of surgical video data are generated per year. On the other hand, preparing surgical data for AI training requires significantly higher levels of professional expertise, and training on that data requires expensive computational resources. These trade-offs paint an uncertain picture of whether, and to what extent, modern AI could aid surgical practice. In this paper, we explore this question through a case study of surgical tool detection using state-of-the-art AI methods available in 2026. We demonstrate that even with multi-billion-parameter models and extensive training, current Vision Language Models fall short in the seemingly simple task of tool detection in neurosurgery. Additionally, we show scaling experiments indicating that increasing model size and training time leads only to diminishing improvements in relevant performance metrics. Thus, our experiments suggest that current models could still face significant obstacles in surgical use cases. Moreover, some obstacles cannot simply be "scaled away" with additional compute and persist across diverse model architectures, raising the question of whether data and label availability are the only limiting factors. We discuss the main contributors to these constraints and advance potential solutions.

12
Pneumonia Detection in Paediatric Chest X-Rays using Ensembled Large Language Models

Tan, J.; Tang, P. H.

2026-04-12 radiology and imaging 10.64898/2026.04.10.26347909 medRxiv
Top 0.3%
0.9%

Background: Paediatric pneumonia is a leading cause of childhood morbidity and mortality worldwide. Chest X-rays (CXR) are an important diagnostic tool in the diagnosis of pneumonia, but shortages in specialist radiology services lead to clinically significant delays in CXR reporting. The ability to communicate findings to both clinicians and laypersons allows multimodal large language models (MLLMs) to be deployed throughout clinical workflows, from image analysis to patient communication. However, MLLMs currently underperform state-of-the-art deep learning classifiers. Objective: To evaluate the diagnostic accuracy of ensemble strategies with MLLMs compared to the baseline average agent for paediatric radiological pneumonia detection. Methods: We conducted a retrospective cohort study using paediatric CXRs from two independent hospital datasets totalling 2300 CXRs. Fifteen MedGemma-4B-it agents independently classified each CXR into five pneumonia likelihood categories. Majority voting, soft voting, and GPT-OSS-20B aggregation were compared against the average agent performance. The primary metric evaluated was OvR AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's kappa, and OvO AUROC. Results: Soft voting achieved improvements in OvR AUROC (p_balanced = 0.0002, p_real-world = 0.0003), accuracy (p_balanced = 0.0008, p_real-world < 0.0001), Cohen's kappa (p_balanced = 0.0006, p_real-world = 0.0054) and OvO AUROC (p_balanced < 0.0001, p_real-world = 0.0011) across both datasets, and a superior F1-score (p_balanced = 0.0028) for the balanced dataset. Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near-real-time clinical decision support with explainable outputs and has potential for integration into emergency departments. Our system's high specificity supports triage by flagging high-risk radiological pneumonia cases.
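The soft-voting aggregation that wins in this study simply averages each agent's per-class probability vector before taking the argmax; a minimal sketch with toy numbers (not actual MedGemma outputs):

```python
def soft_vote(agent_probs):
    """Average per-class probabilities over agents, then pick the argmax.
    Unlike majority voting, a hesitant agent's low-confidence dissent is
    weighted accordingly rather than counting as a full vote."""
    n_agents = len(agent_probs)
    n_classes = len(agent_probs[0])
    avg = [sum(p[c] for p in agent_probs) / n_agents
           for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# Three agents scoring one CXR over five pneumonia-likelihood categories:
label, avg = soft_vote([
    [0.1, 0.2, 0.4, 0.2, 0.1],
    [0.0, 0.1, 0.6, 0.2, 0.1],
    [0.2, 0.2, 0.2, 0.3, 0.1],
])
# → label 2
```

Majority voting over the same three agents would tally only each agent's top category, discarding the probability mass that makes soft voting better calibrated here.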

13
Visual Fidelity-Driven Quality Assessment of Medical Image Translation

Bizjak, Z.; Zagar, J.; Spiclin, Z.

2026-03-20 radiology and imaging 10.64898/2026.03.18.26348721 medRxiv
Top 0.3%
0.9%

Automated and reliable image quality assessment (IQA) is essential for safe use of medical image synthesis in critical applications like adaptive radiotherapy, treatment planning, or missing-modality reconstruction, where unnoticed generative artifacts may adversely affect outcomes. We evaluated image-to-image translation quality by coupling large-scale expert visual quality assessment with explainable automated IQA modeling. An adversarial diffusion-based framework, SynDiff, was applied to four cross-modality synthesis tasks, including three inter-MR translations and a CBCT-to-CT translation. Using four-fold cross-validation, ten reference-based and eight no-reference IQA metrics were computed for all synthesized images. Visual IQA ratings were independently collected from thirteen expert raters using a predetermined protocol and a specialized image viewer enabling blinded, randomized six-point Likert scoring. Auto-Sklearn was employed to learn ensemble regression models mapping IQA metrics to visual consensus ratings, with separate models trained on reference-based and no-reference metrics. The models closely reproduced the distribution and ordering of expert ratings, typically within +/- 0.5 Likert points. Reference-based models achieved higher agreement with visual ratings than no-reference models (R^2 = 0.75 vs. 0.59, respectively), although the latter remained unbiased and informative. Explainability analyses highlighted structure- and contrast-sensitive metrics as key predictors. Overall, the results demonstrate that ensemble regression models can provide transparent, scalable, and clinically meaningful quality control for generative medical imaging.
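As a concrete example of the reference-based metrics such pipelines compute, PSNR scores a synthesized image against its ground-truth target. A generic flattened-pixel-list sketch; the study's actual metric set is broader and not reproduced here:

```python
import math

def psnr(reference, synthesized, max_val=255.0):
    """Peak signal-to-noise ratio between a reference image and a
    synthesized one (both flattened to pixel lists); higher is better."""
    n = len(reference)
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / n
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# A uniform +10 intensity error gives MSE = 100:
score = psnr([100.0] * 16, [110.0] * 16)  # about 28.1 dB
```

Reference-based metrics like this require the ground-truth target, which is exactly why the paper also trains no-reference models for deployment settings where no target exists.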

14
New three-dimensional preclinical models to understand and treat liver cancers activated for the β-catenin pathway

Bou Malham, V.; Leandre, F.; Hamimi, A.; Lagoutte, I.; Bouchet, S.; Gougelet, A.; Colnot, S.; Desbois-Mouthon, C.

2026-04-03 cell biology 10.64898/2026.04.01.715868 medRxiv
Top 0.3%
0.8%

Background & aims: Constitutive activation of the β-catenin pathway is a determining feature in the pathogenesis of two primary liver cancers, namely HCC and hepatoblastoma (HB). Activating alterations in the CTNNB1 gene and, to a lesser extent, inhibiting alterations in the APC gene are observed in 30 to 40% of HCC cases and 80 to 90% of HB cases. For both tumours, therapeutic management is far from optimal. Therefore, relevant experimental models are needed to increase our knowledge and test new therapeutic approaches. Methods: Organoids and tumouroids were established from APCΔhep and βcatΔex3 mouse models, which are clinically relevant models for β-catenin-activated HCC and mesenchymal HB. We developed a new methodological approach based on dynamic suspension culture in a rotating bioreactor. Morphological and molecular characteristics and sensitivity to WNTinib, a treatment already successfully tested on human HCC and HB tumouroids, were evaluated by histology, immunohistochemistry, immunofluorescence, and RT-qPCR. Results: This easy-to-implement methodology allows for the rapid generation of a large number of organoids and tumouroids that are uniform in size and show no signs of cell death in their core. The robustness of the methodology is illustrated by the maintenance of the histological architecture, cell diversity and gene expression in organoids and tumouroids in comparison with the native liver tissues. In addition, the value of the HCC-derived tumouroids for evaluating cancer treatment was assessed based on their responsiveness to the β-catenin antagonist WNTinib. Conclusions: The organoids and tumouroids that we present here are new, reliable in vitro cancer models, recapitulating the main features of β-catenin-driven HCC and mesenchymal HB. They can be integrated into an appropriate platform for drug screening and could enable the development of "a la carte" therapies that are urgently needed for these indications.

Impact and implications: This study addresses the critical need for representative in vitro models to investigate β-catenin-driven liver cancers. The organoids and tumouroids developed here are particularly valuable for researchers seeking robust, reproducible models that accurately reflect the cellular diversity and gene expression profiles of native liver tumours. These findings have practical applications in exploring cancer mechanisms, screening new drugs, optimizing personalized treatment strategies, and reducing reliance on animal models, which ultimately benefits patients.

Highlights:
- Easy and rapid generation of mouse liver organoids and tumouroids from β-catenin-activated tumours using culture in a bioreactor
- Tumouroids preserve histology, cell diversity, and gene expression of native tissue
- HCC-derived tumouroids respond to β-catenin inhibitor WNTinib
- These reliable 3D models reduce reliance on animal experiments for drug testing

15
Development and Pilot Validation of ABHA-O-SHINE: An AI-Ready Oral Health Risk and Insurance Prediction Framework within the Ayushman Bharat Digital Ecosystem

Saxena, Y.; SHRIVASTAVA, L.

2026-04-01 public and global health 10.64898/2026.03.31.26349846 medRxiv
Top 0.3%
0.8%

Background: Oral health remains inadequately integrated within the Ayushman Bharat Digital Mission (ABDM), particularly in terms of structured risk assessment and its linkage to insurance-based decision-making. There is a growing need for scalable models that can connect clinical oral health data with digital health systems and support future artificial intelligence (AI)-driven applications. Aim: To develop and pilot test the ABHA-O-SHINE framework for oral health risk prediction and insurance prioritization, with a future scope for AI integration within the Ayushman Bharat Health Account (ABHA) ecosystem. Materials and Methods: A cross-sectional pilot study was conducted among 126 participants attending the outpatient department of Swargiya Dadasaheb Kalmegh Smruti Dental College and Hospital, Nagpur. Participants were selected based on predefined inclusion and exclusion criteria. Data collection included a structured questionnaire and clinical examination using the WHO Oral Health Assessment Form (2013). A composite risk score (0 to 14) was developed incorporating behavioral and clinical parameters. Participants were categorized into low, moderate, and high-risk groups, and corresponding insurance priority levels were assigned. Statistical analysis included descriptive statistics, Chi-square test, Spearman correlation, and binary logistic regression. Results: The majority of participants were categorized under moderate to high-risk groups. Tobacco use showed a statistically significant association with higher risk levels (p < 0.05). Positive correlations were observed between total risk score and clinical indicators such as DMFT and CPI. Logistic regression analysis identified tobacco use and clinical scores as significant predictors of high-risk categorization. Conclusion: The ABHA-O-SHINE framework demonstrates feasibility in integrating oral health risk assessment with an insurance prioritization model. 
The framework is designed to be AI-compatible, enabling future automation through machine learning and image-based analysis within the ABDM ecosystem. Keywords: ABHA, ABDM, Oral Health, Risk Assessment, Insurance, Artificial Intelligence.
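The score-to-category mapping described above (a composite 0-14 score assigned to low/moderate/high risk with a matching insurance priority) can be sketched as follows. The cut-off values and priority labels here are illustrative assumptions, since the abstract does not state its thresholds:

```python
def categorize_risk(score: int) -> tuple[str, str]:
    """Map a composite oral-health risk score (0-14) to a risk group
    and an insurance priority level. Cut-offs are illustrative
    assumptions, not the study's published thresholds."""
    if not 0 <= score <= 14:
        raise ValueError("score must be between 0 and 14")
    if score <= 4:
        return "low", "standard priority"
    if score <= 9:
        return "moderate", "elevated priority"
    return "high", "highest priority"
```

Any real deployment would calibrate these cut-offs against clinical outcomes rather than hard-coding them.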

16
Improving Medicare Fraud Detection Accuracy in Deep Learning by Exploring Feature Selection and Data Sampling Techniques.

Ahammed, F.

2026-03-20 health informatics 10.64898/2026.03.18.26348763 medRxiv
Top 0.4%
0.7%

Fraud in the health landscape is an aggravating issue, with far-reaching consequences that burden the financial stability of the health industry and threaten the quality of medical care. It results from vulnerabilities within the current healthcare framework that fraudsters exploit in their favor. Although many models have been developed to detect fraudulent patterns in insurance claims, their accuracy frequently suffers from the class imbalance of the Medicare dataset and from irrelevant features. This study aims to improve detection performance and accuracy by employing a deep learning model together with data sampling and feature selection techniques. A comparative analysis among different combinations is conducted to determine their efficacy in enhancing the accuracy of the fraud detection model. The results demonstrate that combining data sampling and feature selection techniques improves accuracy and performance: the model reached 95.4% accuracy, with negligible evidence of overfitting, using Chi-square feature selection together with the Synthetic Minority Over-sampling Technique (SMOTE). Ultimately, the study findings underscore the significance of employing these combined techniques instead of the baseline deep learning model alone for better performance in detecting Medicare insurance fraud.
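The two preprocessing steps this abstract combines, chi-square feature scoring and SMOTE-style minority oversampling, can be sketched with numpy alone. These are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square score per nonnegative feature against class labels:
    compare class-wise observed feature sums with the sums expected
    under the class priors (the formulation scikit-learn's chi2 uses)."""
    X, y = np.asarray(X, float), np.asarray(y)
    classes = np.unique(y)
    observed = np.array([X[y == c].sum(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])
    expected = np.outer(priors, X.sum(axis=0))
    return ((observed - expected) ** 2 / expected).sum(axis=0)

def smote_like(X_min, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, float)
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)           # never pick a point as its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), n_new)
    picks = nbrs[base, rng.integers(0, k, n_new)]
    lam = rng.random((n_new, 1))          # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[picks] - X_min[base])
```

In a pipeline like the one described, the chi-square scores would drop low-ranked features before the synthetic minority samples are generated and the deep model is trained.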

17
AI-Driven Reconstruction of the Research Paradigm for Phase Separation in Membraneless Organelle

Ding, Y.; Lu, T.; Li, Y.

2026-04-02 cell biology 10.64898/2026.03.31.715491 medRxiv
Top 0.6%
0.7%

Liquid-liquid phase separation (LLPS) of biomacromolecules is a key mechanism driving the formation of membraneless organelles (MLOs) within cells, playing a crucial role in fundamental biological processes such as cell proliferation and stress response. Accurately understanding and predicting the phase separation propensity of proteins is essential for unraveling the assembly mechanisms of MLOs and their functions under both physiological and pathological conditions. Traditional research methods primarily rely on biochemical experiments, which are limited by low throughput, high cost, and difficulty in systematically exploring sequence-phase transition relationships. This study proposes and implements a novel three-stage, iterative paradigm based on artificial intelligence (AI) to propel phase separation research towards systematization, predictability, and mechanistic understanding.
1. Benchmark Model Construction: A preliminary predictive model was established based on a Multilayer Perceptron (MLP) neural network, and the driving effect of phenylalanine/tyrosine (F/Y) residue-mediated π-π interactions on LLPS was validated.
2. Model Robustness Enhancement: The model was optimized through adversarial training strategies, which effectively identified and eliminated misclassifications of "highly disordered non-phase-separating" trap sequences. This significantly improved the model's generalization capability and reliability when handling complex, real-world sequences.
3. Physical Mechanism Integration and Functional Expansion: Incorporating the Uniform Manifold Approximation and Projection (UMAP) manifold learning method and constraints from non-equilibrium thermodynamics, a "fingerprint space" capable of characterizing the thermodynamic behavior of phase separation was constructed. This space enables cluster analysis of different MLO types, and the model can output a thermodynamic stability score for protein phase separation. Based on this score, we identified 10 high-confidence candidate proteins with the potential to form novel MLOs.
The paradigm established in this study upgrades phase separation prediction from the traditional "binary classification" approach to a novel research framework characterized by "physical mechanism analysis + novel MLO discovery." It provides the phase separation field with a computational tool that combines high accuracy, strong robustness, and good physical interpretability.
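The F/Y composition signal validated in the benchmark stage can be illustrated with a toy feature extractor. The feature set below is an assumption for illustration only and is far simpler than the inputs an MLP predictor would actually use:

```python
def aromatic_pi_features(seq: str) -> dict:
    """Simple sequence features relevant to pi-pi-driven phase separation:
    fractions of phenylalanine (F) and tyrosine (Y) residues and their
    combined aromatic density. Illustrative only."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    f = seq.count("F") / n
    y = seq.count("Y") / n
    return {"frac_F": f, "frac_Y": y, "frac_FY": f + y}
```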

18
Med-ICE: Enhancing Factual Accuracy in Medical AI through Autonomous Multi-Agent Consensus

Chen, Z.; Wu, R.; Liu, Y.; Li, R.; Duprey, A.

2026-04-04 health informatics 10.64898/2026.04.02.26350080 medRxiv
Top 0.7%
0.5%

The integration of Large Language Models into high-stakes clinical workflows is critically hampered by their lack of verifiable reliability and tendency to generate hallucinations. This paper introduces Med-ICE, an autonomous framework designed to enhance the reliability of LLMs for medical applications. Med-ICE adapts the Iterative Consensus Ensemble paradigm, enabling a group of peer LLM agents to collaboratively converge on a final answer through iterative rounds of generation and peer review, thereby eliminating the need for an external arbiter and its associated scalability bottleneck. Our work makes three key contributions: (1) a novel semantic consensus mechanism that determines agreement based on semantic similarity, crucial for nuanced clinical language; (2) demonstration of state-of-the-art performance, where Med-ICE significantly outperforms both direct single-LLM generation and the Self-Refinement technique on challenging medical benchmarks; and (3) a highly efficient and scalable architecture, as our Semantic Consensus Monitor is computationally lightweight. This research establishes a new standard for developing safer, more trustworthy LLM systems, paving the way for their responsible integration into medicine.
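The generation/peer-review loop with a semantic-consensus stopping rule can be sketched as below. The token-overlap `jaccard` function is a toy stand-in for the embedding-based Semantic Consensus Monitor, and the function names and threshold are hypothetical:

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Toy token-overlap similarity standing in for an embedding model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def iterative_consensus(agents, question, threshold=0.8, max_rounds=5):
    """Run peer agents until every pairwise answer similarity clears the
    threshold (semantic consensus) or the round budget is exhausted.
    `agents` are callables taking (question, peer_answers) -> answer."""
    answers = [agent(question, []) for agent in agents]
    for _ in range(max_rounds):
        sims = [jaccard(a, b) for a, b in combinations(answers, 2)]
        if all(s >= threshold for s in sims):
            break
        # each agent revises its answer after reviewing its peers' answers
        answers = [agent(question, answers) for agent in agents]
    # return the answer most similar to the rest (no external arbiter)
    return max(answers, key=lambda a: sum(jaccard(a, b) for b in answers))
```

The key design point the abstract highlights is that the consensus check is cheap relative to another LLM call, so the monitor does not become a scalability bottleneck.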

19
REDDI: A Riemannian Ensemble Learning Framework for Interpretable Differential Diagnosis of Neurodegenerative Diseases

Roca, M.; Messuti, G.; Klepachevskyi, D.; Angiolelli, M.; Bonavita, S.; Trojsi, F.; Demuru, M.; Troisi Lopez, E.; Chevallier, S.; Yger, F.; Saudargiene, A.; Sorrentino, P.; Corsi, M.-C.

2026-04-12 neurology 10.64898/2026.04.10.26350617 medRxiv
Top 0.8%
0.5%

Neurodegenerative diseases such as Mild Cognitive Impairment (MCI), Multiple Sclerosis (MS), Parkinson's Disease (PD), and Amyotrophic Lateral Sclerosis (ALS) are becoming more prevalent. Each of these diseases, despite its specific pathophysiological mechanisms, leads to widespread reorganization of brain activity. However, the corresponding neurophysiological signatures of these changes have remained elusive. As a consequence, to date, it is not possible to effectively distinguish these diseases from neurophysiological data alone. This work uses Magnetoencephalography (MEG) resting-state data, combined with interpretable machine learning techniques, to support differential diagnosis. We expand on previous work and design a Riemannian geometry-based classification pipeline. The pipeline is fed with typical connectivity metrics, such as covariance or correlation matrices. To maintain interpretability while reducing feature dimensionality, we introduce a classifier-independent feature selection procedure that uses effect sizes derived from the Kruskal-Wallis test. The ensemble classification pipeline, called REDDI, achieved a mean balanced accuracy of 0.81 (+/-0.04) across five folds, representing a 13% improvement over the state of the art, while remaining clinically transparent. As such, our approach yields a reliable, interpretable, data-driven, operator-independent decision-support tool for Neurology.
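The classifier-independent selection step, ranking features by effect sizes from the Kruskal-Wallis test, can be sketched with numpy alone. Tie correction is omitted for brevity, and epsilon-squared is one common effect-size choice assumed here (the preprint does not specify which it uses):

```python
import numpy as np

def kruskal_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of
    1-D arrays, one per diagnostic group."""
    data = np.concatenate(groups)
    n = len(data)
    ranks = np.argsort(np.argsort(data)) + 1.0  # ranks; ties broken arbitrarily
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        h += r.sum() ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)

def epsilon_squared(groups):
    """Epsilon-squared effect size: H / ((n^2 - 1) / (n + 1))."""
    n = sum(len(g) for g in groups)
    return kruskal_h(groups) / ((n * n - 1) / (n + 1))

def select_features(X_by_group, top_k):
    """Rank features by Kruskal-Wallis effect size across groups and
    return the indices of the top_k (classifier-independent selection)."""
    n_feat = X_by_group[0].shape[1]
    effects = [epsilon_squared([g[:, j] for g in X_by_group])
               for j in range(n_feat)]
    return np.argsort(effects)[::-1][:top_k]
```

Because the ranking depends only on the group labels, the same selected features can feed any downstream classifier on the Riemannian manifold.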

20
MedScope: A Lightweight Benchmark of Open-Source Large Language Models for Medical Question Answering

Bian, R.; Cheng, W.

2026-04-01 health informatics 10.64898/2026.03.31.26349827 medRxiv
Top 0.8%
0.5%

The rapid development of large language models (LLMs) has stimulated growing interest in their use for medical question answering and clinical decision support. However, compared with frontier proprietary systems, the empirical understanding of lightweight open-source LLMs in medical settings remains limited, particularly under resource-constrained experimental conditions. To address this gap, we introduce MedScope, a lightweight benchmarking framework for systematically evaluating open-source LLMs on medical multiple-choice question answering. Using 1,000 sampled questions from MedMCQA, we benchmark six lightweight open-source models spanning three representative model families: LLaMA, Qwen, and Gemma. Beyond standard predictive metrics such as accuracy and macro-F1, our framework additionally considers inference time, prediction consistency, subject-wise variability, and model-specific error patterns. We further develop a set of multi-perspective visual analyses, including clustered heatmaps, agreement matrices, Pareto-style trade-off plots, radar charts, and multi-panel summary figures, in order to characterize model behavior in a more interpretable and comprehensive manner. Our results reveal substantial heterogeneity across models in predictive performance, efficiency, and subject-level robustness. While larger lightweight models generally achieve better overall results, the gain is neither uniform across subject categories nor always aligned with efficiency. These findings suggest that lightweight open-source LLMs remain valuable as transparent and reproducible medical AI baselines, but their current capabilities are still insufficient for unsupervised deployment in high-risk healthcare scenarios. MedScope provides an accessible benchmark for evaluating lightweight medical LLMs and emphasizes the need for multi-dimensional assessment beyond accuracy alone. The relevant code is now open-sourced at: https://github.com/VhoCheng/MedScope.
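The macro-F1 metric the benchmark reports alongside accuracy is the unweighted mean of per-class F1 scores; a minimal sketch:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that appear
    in the gold labels. Equal weight per class makes rare answer
    options count as much as common ones."""
    classes = sorted(set(y_true))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```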